Gamified crowdsourcing for idiom corpora construction

Abstract

Learning idiomatic expressions is seen as one of the most challenging stages in second-language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge not only for humans but also for artificial intelligence systems. This article introduces a gamified crowdsourcing approach for collecting language learning materials for idiomatic expressions; a messaging bot is designed as an asynchronous multiplayer game for native speakers who compete with each other while providing idiomatic and nonidiomatic usage examples and rating other players’ entries. As opposed to classical crowd-processing annotation efforts in the field, for the first time in the literature, a crowd-creating & crowd-rating approach is implemented and tested for idiom corpora construction. The approach is language-independent and is evaluated on two languages in comparison to traditional data preparation techniques in the field. The reaction of the crowd is monitored under different motivational means (namely, gamification affordances and monetary rewards). The results reveal that the proposed approach is powerful in collecting the targeted materials, and although being an explicit crowdsourcing approach, it is found entertaining and useful by the crowd. The approach has been shown to have the potential to speed up the construction of idiom corpora to be used as second-language learning material, training data for supervised idiom identification systems, or samples for lexicographic studies.

Similar resources

Interview: Acquiring Corpora using Crowdsourcing

Crowdsourcing has become one of the hottest topics in the artificial intelligence community in recent years. Its application to speech and language processing tasks like speech transcription has been very appealing, but what about creating corpora? Can we harness the power of crowdsourcing to improve training data sets for spoken language processing applications like dialogue systems? project to...

The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy

Crowdsourcing is an increasingly popular, collaborative approach for acquiring annotated corpora. Despite this, reuse of corpus conversion tools and user interfaces between projects is still problematic, since these are not generally made available. This demonstration will introduce the new, open-source GATE Crowdsourcing plugin, which offers infrastructural support for mapping documents to cro...

Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features

Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the results of an idiom identification experiment using the corpus. The corpus targets 146 amb...

Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing

Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpora for machine translation research. We apply this to building a collection of parallel corpora between English and six languages from the Indian subcontinent: Bengali, Hindi, Malayalam, Tamil, Telugu, and Urdu. These languages are low-resource, under-studied, and exhibit linguistic phenomena tha...

Automatic Corpora Construction for Text Classification

Since the machines become more and more intelligent, it is reasonable to expect the automatic construction of text classifiers by given just the objective categories. As trade-off solutions, existing researches usually provide additional information to the category terms to enhance the performance of a classifier. Unique from them, in this paper, we construct the standard corpora from the web b...

Journal

Journal title: Natural Language Engineering

Year: 2022

ISSN: 1469-8110, 1351-3249

DOI: https://doi.org/10.1017/s1351324921000401